Cache Oblivious Matrix Operations Using Peano Curves

Authors

  • Michael Bader
  • Christian E. Mayer
Abstract

Algorithms are called cache oblivious if they are designed to benefit from any kind of cache hierarchy, regardless of its size or number of cache levels. In linear algebra computations, block recursive approaches are common: by construction, they lead to inherently local data access patterns and thus to good overall cache performance [3]. In this article, we present block recursive approaches that store the matrix elements in an element ordering based on a Peano space filling curve. We present algorithms for matrix multiplication and LU decomposition that are able to minimize the number of cache misses on any cache level.

1 A block recursive scheme for matrix multiplication

Consider the multiplication of two 3×3 matrices, as given in equation (1), where the indices of the matrix elements indicate the order in which the elements are stored in memory.

$$
\underbrace{\begin{pmatrix} a_0 & a_5 & a_6 \\ a_1 & a_4 & a_7 \\ a_2 & a_3 & a_8 \end{pmatrix}}_{=:\,A}
\underbrace{\begin{pmatrix} b_0 & b_5 & b_6 \\ b_1 & b_4 & b_7 \\ b_2 & b_3 & b_8 \end{pmatrix}}_{=:\,B}
=
\underbrace{\begin{pmatrix} c_0 & c_5 & c_6 \\ c_1 & c_4 & c_7 \\ c_2 & c_3 & c_8 \end{pmatrix}}_{=:\,C}
\tag{1}
$$

The scheme is similar to a column-major ordering; however, the element order in every second column is reversed, which leads to a meandering scheme that is also equivalent to the basic pattern of a Peano space filling curve. Now, if we examine the operations that compute the elements $c_r$ of the result matrix, we note that they can be executed in a very convenient order, such that from each operation to the next, an element is either reused or one of its direct neighbours in memory is accessed:
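The locality claim can be checked mechanically. The Python sketch below is a hypothetical illustration, not the authors' code, and models only the 3×3 base case. The function `storage_index` reproduces the subscripts of equation (1). The function `peano_operation_order` is one assumed ordering of the 27 updates c_ij += a_ik * b_kj: a three-dimensional boustrophedon over the indices i, k and j. The function `is_local` then verifies that, for all three matrices, every step either reuses the current element or moves to a direct memory neighbour. All function names are invented for this sketch, and the recursive extension to larger matrices is not shown.

```python
# Hypothetical sketch (not the authors' code): only the 3x3 base case is modelled.
N = 3


def storage_index(i: int, j: int) -> int:
    """Memory position of element (row i, column j) in the meandering layout.

    Columns are stored one after another; every second column (odd j,
    counting from 0) is stored bottom-to-top, as in equation (1).
    """
    return N * j + (i if j % 2 == 0 else N - 1 - i)


def snake(n: int, reverse: bool) -> range:
    """The indices 0..n-1, either ascending or descending."""
    return range(n - 1, -1, -1) if reverse else range(n)


def peano_operation_order():
    """Yield (i, k, j) for the 27 updates c_ij += a_ik * b_kj.

    Assumed order: j runs forwards, k reverses direction with every new j,
    and i reverses direction with every new (j, k) pair, so consecutive
    triples differ in exactly one index, by exactly 1.
    """
    block = 0
    for j in range(N):
        for k in snake(N, reverse=(j % 2 == 1)):
            for i in snake(N, reverse=(block % 2 == 1)):
                yield i, k, j
            block += 1


def is_local(ops) -> bool:
    """True if, between consecutive updates, each of A, B and C either reuses
    its element or moves to a direct neighbour in memory."""
    prev = None
    for i, k, j in ops:
        cur = (storage_index(i, k), storage_index(k, j), storage_index(i, j))
        if prev is not None and any(abs(c - p) > 1 for c, p in zip(cur, prev)):
            return False
        prev = cur
    return True


def to_meander(m):
    """Flatten a 3x3 list-of-rows matrix into meandering storage order."""
    flat = [0] * (N * N)
    for i in range(N):
        for j in range(N):
            flat[storage_index(i, j)] = m[i][j]
    return flat


def multiply_meander(a, b):
    """C = A * B for flat 3x3 matrices stored in meandering order."""
    c = [0] * (N * N)
    for i, k, j in peano_operation_order():
        c[storage_index(i, j)] += a[storage_index(i, k)] * b[storage_index(k, j)]
    return c


if __name__ == "__main__":
    ops = list(peano_operation_order())
    print(len(ops), is_local(ops))
    for i, k, j in ops[:4]:
        print(f"c{storage_index(i, j)} += a{storage_index(i, k)} * b{storage_index(k, j)}")
```

Running the script prints `27 True` followed by the first four updates: `c0 += a0 * b0`, `c1 += a1 * b0`, `c2 += a2 * b0` and `c2 += a3 * b1`. By contrast, a plain i-j-k loop nest over the same layout fails the check on its very first step, where the access to A jumps from a0 to a5.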

Similar resources

Cache Oblivious Dense and Sparse Matrix Multiplication Based on Peano Curves

Cache oblivious algorithms are designed to benefit from any existing cache hierarchy—regardless of cache size or architecture. In matrix computations, cache oblivious approaches are usually obtained from block-recursive approaches. In this article, we extend an existing cache oblivious approach for matrix operations, which is based on Peano space-filling curves, for multiplication of sparse and...

Cache oblivious matrix multiplication using an element ordering based on the Peano curve

One of the keys to tap the full performance potential of current hardware is the optimal utilisation of cache memory. Cache oblivious algorithms are designed to inherently benefit from any underlying hierarchy of caches, but do not need to know about the exact structure of the cache. In this paper, we present a cache oblivious algorithm for matrix multiplication. The algorithm uses a block recu...

Comparative study of space filling curves for cache oblivious TU Decomposition

We examine several matrix layouts based on space-filling curves that allow for a cache-oblivious adaptation of parallel TU decomposition for rectangular matrices over finite fields. The TU algorithm of [11] requires index conversion routines for which the cost to encode and decode the chosen curve is significant. Using a detailed analysis of the number of bit operations required for the encodin...

Cache Oblivious Matrix Transposition: Simulation and Experiment

A cache oblivious matrix transposition algorithm is implemented and analyzed using simulation and hardware performance counters. Contrary to its name, the cache oblivious matrix transposition algorithm is found to exhibit a complex cache behavior with a cache miss ratio that is strongly dependent on the associativity of the cache. In some circumstances the cache behavior is found to be worse th...

Cache-oblivious Algorithms

This thesis presents “cache-oblivious” algorithms that use asymptotically optimal amounts of work, and move data asymptotically optimally among multiple levels of cache. An algorithm is cache oblivious if no program variables dependent on hardware configuration parameters, such as cache size and cache-line length need to be tuned to minimize the number of cache misses. We show that the ordinary...

Publication date: 2006